-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Bug in loc did not change dtype when complete column was assigned #37749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexing.py
Outdated
@@ -1550,6 +1551,13 @@ def _setitem_with_indexer(self, indexer, value): | |||
val = list(value.values()) if isinstance(value, dict) else value | |||
blk = self.obj._mgr.blocks[0] | |||
take_split_path = not blk._can_hold_element(val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be the else condtiion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, value can be anything from int, float to numpy array. I think this check is only necessary if we have Series or DataFrame. Maybe with an array?
@@ -298,6 +299,36 @@ def test_iloc_setitem_bool_indexer(self, klass): | |||
expected = DataFrame({"flag": ["x", "y", "z"], "value": [2, 3, 4]}) | |||
tm.assert_frame_equal(df, expected) | |||
|
|||
def test_setitem_complete_columns_different_dtypes(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an int64 column in the original frame (that is not selected).
also can you test with Int64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the column.
It seems like astype does not work for Int64
if the input is from object dtype. Should I add a completly new test or is there a trick I am not aware of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's right, you can .astype('int64').astype(dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much, that is pretty obvious. Parametrized the test now
expected = DataFrame({"A": ["a", "b"], "B": [1, 2], "C": [3, 4]}, dtype="int64") | ||
tm.assert_frame_equal(df, expected) | ||
|
||
def test_setitem_single_column_as_series_different_dtype(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about (add to the original frame) and test Int64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
� Conflicts: � doc/source/whatsnew/v1.2.0.rst
� Conflicts: � doc/source/whatsnew/v1.2.0.rst
� Conflicts: � doc/source/whatsnew/v1.2.0.rst � pandas/tests/frame/indexing/test_setitem.py � pandas/tests/indexing/test_iloc.py
@@ -289,6 +289,27 @@ def test_setitem_periodindex(self): | |||
assert isinstance(rs.index, PeriodIndex) | |||
tm.assert_index_equal(rs.index, rng) | |||
|
|||
@pytest.mark.parametrize("klass", [list, np.array]) | |||
def test_iloc_setitem_bool_indexer(self, klass): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test name is good, belongs in tests.indexing.test_iloc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/tests/indexing/test_iloc.py
Outdated
) | ||
expected = DataFrame( | ||
{ | ||
"date0": [to_datetime("2015-01-01"), to_datetime("2016-01-01")], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use Timestamp instead of to_datetime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
The case described in #37593 fails now again, because split path can not handle Series assignment into a single cell. Will look into this further tomorrow |
� Conflicts: � doc/source/whatsnew/v1.2.0.rst � pandas/tests/frame/indexing/test_setitem.py
Puuuuh, cc @jbrockmendel
runs now through
The case is pretty weird with setting a Series into a single cell. Any idea how we could accomplish this within |
so ive mentioned a couple of times an upcoming PR that will make all DataFrame cases go through split_path. To make that work there are two kludgy checks I have to do near the top of the method (also non-kludges #37931 and #37932):
where I've defined
Looks like you're dealing with roughly the same problem that drove me to put in the double is_integer checks. |
It is exactly the same problem. If there is not elegant way, we could check if we have this case before setting split path to True. But this makes only sense if your pr does not arrive before 1.2 and this could go in |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
� Conflicts: � doc/source/whatsnew/v1.2.0.rst � pandas/tests/frame/indexing/test_setitem.py � pandas/tests/indexing/test_iloc.py � pandas/tests/indexing/test_loc.py
@jbrockmendel Is your PR with always split path still planned? |
status here? |
Depends, this sends a few more code paths down the split path in indexing, but I don't know if @jbrockmendel has something other in mind here? |
� Conflicts: � doc/source/whatsnew/v1.3.0.rst � pandas/core/indexing.py � pandas/tests/frame/indexing/test_setitem.py � pandas/tests/indexing/test_iloc.py
# GH#20635 | ||
ser = Series(["3", "4"], name="A") | ||
ser.loc[:] = ser.astype("int64").astype(dtype) | ||
expected = Series([3, 4], name="A", dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is doing the opposite of #39163. did we decide to revert part or all of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since yours is significantly newer I am fine with „closing“ this. Would check if some of the issues are fixed
@phofl what's the status on this? |
@phofl closing as stale. reopen when ready |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff